Component-based, adaptive stroke-order system

ABSTRACT

An efficient and simple approach to encoding ideographic characters as sequences of input strokes or stroke categories is disclosed, wherein: each character is represented by one or more sequences of one or more components; each component corresponds to a plurality of alternative stroke sequences, each of which is associated with a probability that it will be the sequence which the user enters to specify the given component or character; and the probability associated with the user's preferred stroke sequence is automatically increased by the system when the character is selected, thus automatically adapting to a user's preferences.

BACKGROUND OF THE INVENTION

[0001] 1. Technical Field

[0002] The invention relates to a method for identifying characters when entered as strokes. More particularly, the invention relates to a component-based, adaptive stroke order system for fast entry of ideographic language characters.

[0003] 2. Description of the Prior Art

[0004] For many years, portable computers have been getting smaller and smaller. The principal size-limiting component in the effort to produce a smaller portable computer has been the keyboard. If standard typewriter-size keys are used, the portable computer must be at least as large as the keyboard. Miniature keyboards have been used on portable computers, but the miniature keyboard keys have been found to be too small to be easily or quickly manipulated by a user.

[0005] Incorporating a full-size keyboard in a portable computer also hinders true portable use of the computer. Most portable computers cannot be operated without placing the computer on a flat work surface to allow the user to type with both hands. A user cannot easily use a portable computer while standing or moving.

[0006] Recent advances in two-way paging, cellular telephones, and other portable wireless technologies have led to a demand for small and portable two-way messaging systems, and especially for systems which can both send and receive electronic mail (“e-mail”).

[0007] It would therefore be advantageous to develop a keyboard for entry of text into a computer device that is both small and operable with one hand while the user is holding the device with the other hand. Prior development work has considered use of a keyboard that has a reduced number of keys. As suggested by the keypad layout of a touch-tone telephone, many of the reduced keyboards have used a 3-by-4 array of keys.

[0008] Chinese, Japanese, and Korean scripts are based on ancient Chinese characters which make up an ideographic language comprising more than 50,000 characters.

[0009] The characters of an ideographic language are each composed of simpler, constituent parts known as components. Components are the building blocks of ideographic characters and combine in certain predetermined ways to form the characters of an ideographic language. Under current practice, a set of 214 components is used in various combinations to produce the characters of the Chinese language. Each component, in turn, is made up of a series of specific and precisely defined strokes. There are currently about 40 individual stroke shapes in use which, based on variations in size, require the mastery of 82 strokes before practical writing skills for Chinese ideographs are obtained.

[0010] Recent work in fonts, following ISO 10646, the Unicode system, has attempted to describe ideographic characters in terms of smaller functional units rather than directly representing all characters as code points in all of their forms and variations. See, for example, Qin Lu, Ideographic Composition Scheme and Its Applications in Chinese Text Processing (date unknown).

[0011] The sheer size of ideographic languages presents unique challenges for specifying and identifying individual characters, particularly for data entry and data processing. Various schemes have been proposed and descriptions can be found in the literature. See, for example, Y. Chu, Chinese/Kanji Text and Data Processing, IEEE Computer (January 1985); J. Becker, Typing Chinese, Japanese, and Korean, IEEE Computer (January 1985); R. Matsuda, Processing Information in Japanese, IEEE Computer (January 1985); R. Walters, Design of a Bitmapped Multilingual Workstation, IEEE Computer (February 1990); J. Huang, The Input and Output of Chinese and Japanese Characters, IEEE Computer (January 1985); R. Odell, System for Encoding a Collection of Ideographic Characters, U.S. Pat. No. 5,109,352 (28 Apr. 1992); R. Thomas, H. Stohr, Symbol Definition Apparatus, U.S. Pat. No. 5,187,480 (16 Feb. 1993); and B. Hu, Y. Hu, Stroke Entry Key Position Distribution and its Screen Prompts, Chinese Patent Application No. 96120693.4 (Published 29 Apr. 1996).

[0012] Most of these schemes require that the user enter predefined codes or follow a predetermined order of entry of strokes or components. Strokes for each character must be entered in the traditional order taught in school. But for both native speakers and those who have learned an ideographic language later in life, the order of strokes and components is not always obvious and may be difficult to remember for infrequently used characters. Teachers living in different parts of the countries where the language is written may introduce variations in style and order, and older people have developed their own ordering over the course of decades of writing the characters by hand.

[0013] It would be advantageous therefore to provide a scheme for entering strokes and components and selecting characters that would allow or adapt to users' preferred ordering of those strokes or components for each character.

SUMMARY OF THE INVENTION

[0014] The invention provides an efficient and simple method for entering strokes and components to select characters in ideographic languages and for adapting to the user's preferred ordering of strokes and components.

[0015] In a preferred embodiment of the invention, a database record is maintained for each potential character and for the components comprising it, along with information about the sequence of strokes corresponding to each component. The database is searched each time a stroke is entered into the system by a user. Characters with components that match the sequence up to that point are prioritized based on an appropriate linguistic model. The system displays the matching characters in prioritized order and allows the user to scroll through the displayed characters if necessary to select the desired character. Each time a character is selected, the stroke sequences for the components that comprise the character are reprioritized. If a record does not exist for a stroke sequence, the system may add a new record to the database.

[0016] In the preferred embodiment of the invention, there is a corresponding ideographic description database that efficiently represents each character as a set of components positioned within a character grid.

[0017] In another embodiment of the invention, one or more individual characters may be represented by strokes alone.

BRIEF DESCRIPTION OF THE DRAWINGS

[0018] FIG. 1 is a hardware block diagram of the component-based, adaptive stroke order system according to the invention;

[0019] FIG. 2 shows a table of kanji components arranged in order of the number of strokes needed to form the component;

[0020] FIG. 3 is a flow chart of the matching algorithm for the component-based, adaptive stroke order system of FIG. 1;

[0021] FIG. 4 shows an embodiment of the invention that stores a small image of each character's components such that a character is comprised of a component and its position within a grid;

[0022] FIG. 5 shows a stroke entry means and display according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0023] The preferred embodiment as described herein is a reduced keyboard system with a small display, such as a mobile phone. In this embodiment, one of a small number of keys is pressed to enter a stroke. Each stroke entry key is associated with one stroke category; a stroke category represents one or more hand-drawn strokes of similar shape or size. The user of the system performs the mapping between the actual stroke and the corresponding stroke category in his head to determine which key to press. Therefore, “stroke,” “stroke category,” and “stroke entry” may be considered equivalent in describing the preferred embodiment of this invention. In addition, there may be a wildcard key to match any stroke in case the proper stroke category cannot be determined by the user.
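
The key-to-category mapping and the wildcard key described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the stroke categories, the key assignments, and the names `Stroke`, `KEY_TO_STROKE`, `keys_to_strokes`, and `prefix_matches` are all hypothetical.

```python
from enum import Enum


class Stroke(Enum):
    """Hypothetical stroke categories; the text does not enumerate them."""
    HORIZONTAL = "H"
    VERTICAL = "V"
    LEFT_FALLING = "L"
    DOT = "D"
    TURNING = "T"
    WILDCARD = "?"  # stands in for any stroke category


# Assumed keypad layout: one key per stroke category plus a wildcard key.
KEY_TO_STROKE = {
    "1": Stroke.HORIZONTAL,
    "2": Stroke.VERTICAL,
    "3": Stroke.LEFT_FALLING,
    "4": Stroke.DOT,
    "5": Stroke.TURNING,
    "0": Stroke.WILDCARD,
}


def keys_to_strokes(keys):
    """Translate a string of key presses into a list of stroke categories."""
    return [KEY_TO_STROKE[k] for k in keys]


def prefix_matches(entered, stored):
    """Return True if the entered strokes are a prefix of a stored sequence,
    treating each wildcard entry as a match for any stored stroke."""
    if len(entered) > len(stored):
        return False
    return all(e is Stroke.WILDCARD or e is s for e, s in zip(entered, stored))


# Stored sequence for a box-shaped component: vertical, turning, horizontal.
box = [Stroke.VERTICAL, Stroke.TURNING, Stroke.HORIZONTAL]
print(prefix_matches(keys_to_strokes("20"), box))  # vertical, then wildcard
```

Treating the wildcard as a property of the entered stroke, rather than of the stored record, keeps the stored sequences unchanged and confines the extra logic to the comparison step.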

[0024] In an alternative embodiment of the system, stroke entry is performed by means of handwriting recognition of stylus, finger, or hand gestures on a touchscreen or stylus tablet. The gestures may be mapped to predefined stroke categories or they may be given a recognition score that is considered in the component matching algorithm.

[0025] In other embodiments of the system, the strokes may be mapped to keys on a personal computer keyboard or to the buttons on a remote control, e.g. for a set-top box.

[0026] A block diagram of the preferred embodiment is provided in FIG. 1. The keyboard 54 and the display 53 are coupled to a processor 100 through appropriate interfacing circuitry. An optional speaker 102 is also coupled to the processor. The processor 100 receives input from the keyboard, and manages all output to the display and speaker. Processor 100 is coupled to a memory 104. The memory includes a combination of temporary storage media, such as random access memory (RAM), and permanent storage media, such as read-only memory (ROM), floppy disks, hard disks, or CD-ROMs. Memory 104 contains all software routines to govern system operation. Preferably, the memory contains an operating system 106, adaptive stroke-order software 108, and associated data structures 110. The memory also includes an ideographic description database 30. Optionally, the memory may contain one or more application programs 112, 114. Examples of application programs include word processors, software dictionaries, and foreign language translators. Speech synthesis software may also be provided as an application program, allowing the reduced keyboard system to function as a communication aid.

[0027] A table 153 is shown in FIG. 2, and consists of 82 components arranged by number of strokes, i.e. 1 to 9 or more strokes, as shown in the column at the far left side of the display. A stroke is traditionally defined to be an element of an ideographic character that can be drawn with one complete motion without lifting the pen from the paper.

[0028] Rather than identifying a character as a sequence of strokes, the preferred embodiment of the invention identifies a character as a sequence of component parts. The system defines components that can be assembled into characters. Characters are represented as a combination of one or more sets of one or more components, and each set of components may be ordered in a unique sequence. Some characters can be represented as sets of different components and even have a different number of components in each set.
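
One possible shape for such a character record is sketched below. The class name `CharacterRecord`, the helper functions, and the example decompositions are assumptions made for illustration; an actual database would define its own component inventory and decompositions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterRecord:
    char: str
    # Each entry is one ordered component sequence that spells the character.
    component_sequences: tuple


# Illustrative decompositions only.
RECORDS = [
    CharacterRecord("好", (("女", "子"),)),
    CharacterRecord("明", (("日", "月"),)),
    # Two alternative sets with different numbers of components.
    CharacterRecord("国", (("囗", "玉"), ("囗", "王", "丶"))),
]


def sequences_for(char, records=RECORDS):
    """Return every component sequence stored for a character."""
    for rec in records:
        if rec.char == char:
            return rec.component_sequences
    return ()


def characters_containing(component, records=RECORDS):
    """List characters with at least one sequence that uses the component."""
    return [rec.char for rec in records
            if any(component in seq for seq in rec.component_sequences)]
```

Storing the alternatives side by side lets a lookup succeed whichever decomposition the user has in mind, at the cost of one extra tuple per alternative.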

[0029] In an alternative embodiment, each individual stroke may also be a component in the system, and thus a character may be represented as a combination of either strokes or components or both.

[0030] The components themselves are composed of strokes that are written in a certain order. For each component, a set of alternate stroke sequences is provided that corresponds to some or all of the possible ways that a user can enter the sequence of strokes for that component. Each of these stroke sequences is optionally associated with a dynamic priority where, at system initialization, the most common or correct sequence is given a very high priority. Each of the other alternate sequences is given a lower priority appropriate to the probability of being used to enter the component.
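
The initial priority assignment can be sketched as a small table builder. The function name `initial_priorities`, the 0.9 share for the canonical sequence, and the single-letter stroke codes are assumptions; the text only requires that the canonical sequence start very high and the alternates start lower in proportion to their likelihood.

```python
def initial_priorities(canonical, alternates, canonical_share=0.9):
    """Build a priority table for one component.

    canonical: the most common or "correct" stroke sequence (string of codes).
    alternates: dict mapping each alternate sequence to a positive relative
        likelihood of being used.
    canonical_share: assumed fraction of priority given to the canonical
        sequence at initialization.
    """
    if not alternates:
        return {canonical: 1.0}
    total = sum(alternates.values())
    if total <= 0:
        raise ValueError("alternate likelihoods must be positive")
    table = {canonical: canonical_share}
    remaining = 1.0 - canonical_share
    for seq, weight in alternates.items():
        # Split the leftover priority in proportion to each likelihood.
        table[seq] = remaining * weight / total
    return table


# Example: a cross-shaped component, canonical order horizontal then vertical.
cross = initial_priorities("HV", {"VH": 3, "HL": 1})
```

Normalizing the table so that it sums to one lets the priorities be read as probabilities, which matches the abstract's description of each sequence carrying a probability of being entered.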

[0031] Provision must be made for alternate versions of component stroke sequences that are of different lengths; for example, for following a split case, such as “mouth” (or “box”), which typically has the first two strokes (vertical, corner) followed by some other component(s) (inside the box), followed by the closing stroke of “mouth” (horizontal); and for simple stroke misinterpretations. In one embodiment of the invention, each component is constrained to have the same number of strokes for each stroke sequence, and the system provides two different component records to handle these cases. In another embodiment, the second half of the split case is combined with each embedded component to create unique component records for each needed combination.
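
The two ways of handling the split “mouth” case can be compared in a short sketch. It reflects one reading of the paragraph above: the record names (`mouth_open`, `mouth_close`, `jade`), the stroke codes, and the helper functions are hypothetical, and the embedded component is only an example.

```python
# Stroke codes (assumed): V vertical, T corner/turning, H horizontal, D dot.
COMPONENT_STROKES = {
    "mouth_open": "VT",   # first two strokes of the enclosure
    "mouth_close": "H",   # closing stroke, entered last
    "jade": "HHVHD",      # example embedded component
}


def strokes_with_split_records(inner):
    """Option 1: the enclosure is stored as two separate component records."""
    parts = ("mouth_open", inner, "mouth_close")
    return "".join(COMPONENT_STROKES[p] for p in parts)


def make_closed_component(inner):
    """Option 2: fold the closing stroke into a combined component record."""
    name = inner + "+mouth_close"
    COMPONENT_STROKES[name] = (
        COMPONENT_STROKES[inner] + COMPONENT_STROKES["mouth_close"]
    )
    return name


def strokes_with_combined_record(inner):
    """Spell the character as the opening record plus the combined record."""
    parts = ("mouth_open", make_closed_component(inner))
    return "".join(COMPONENT_STROKES[p] for p in parts)
```

Both options yield the same stroke string for the whole character. The first keeps the component inventory small, while the second trades extra records for a simpler two-part character description.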

[0032] An appropriate linguistic model represents the initial frequency of a character relative to other characters, or the probability that the user intends to select that character next. Frequency may be determined by the number of occurrences of the character in written text or in conversation; by the grammar of the surrounding sentence; by its occurrence following the preceding character or characters; by the context in which the system is currently being used, such as typing names into a phonebook application; by its repeated or recent use in the system (the user's own frequency or that of some other source of text); or by any combination thereof. In addition, a character may be prioritized by the probability that a matching component occurs in the character at that point in the entered stroke sequence.
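
A toy combination of three of these signals (overall frequency, occurrence after the preceding character, and recent use) is sketched below. The statistics, the interpolation weight `lam`, the `recency_boost` factor, and the function names are invented for illustration; the text does not prescribe how the signals are combined.

```python
# Toy statistics; every number here is invented for illustration.
UNIGRAM = {"的": 0.040, "中": 0.010, "国": 0.008}
BIGRAM = {("中", "国"): 0.30}  # assumed P(character | preceding character)


def character_priority(char, prev=None, recent=(), lam=0.5, recency_boost=1.5):
    """Score a character from frequency, preceding context, and recent use."""
    base = UNIGRAM.get(char, 0.0)
    if prev is not None:
        # Linear interpolation between the contextual and overall frequency.
        base = lam * BIGRAM.get((prev, char), 0.0) + (1.0 - lam) * base
    if char in recent:
        base *= recency_boost
    return base


def prioritize(chars, prev=None, recent=()):
    """Order candidate characters from most to least likely."""
    return sorted(chars,
                  key=lambda c: character_priority(c, prev, recent),
                  reverse=True)
```

With no context the most frequent character leads the list; once a preceding character is known, a character that commonly follows it can overtake characters that are more frequent overall.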

[0033] Characters are initially prioritized based on the linguistic model and displayed to the user in that order. If any strokes have been entered, only those characters are displayed that have components with at least one stroke sequence matching the strokes entered so far.

[0034] In addition to displaying possible characters, the system may also display possible components, indicated with an underbar for example. After the user selects a component, the system shows only those characters that contain that component. FIG. 3 is a simplified flow diagram showing operation of a preferred embodiment of the invention.

[0035] As the user enters strokes (200), that sequence of strokes is matched (205) against the stroke sequence records for each component. Each possible component is identified (210) at each point in the stroke sequence and weighted (215, 225) according to the current priority of the matching stroke sequence. If the user enters a stroke sequence corresponding to the original default correct stroke sequence (220), there is a very high likelihood of a match and a character is output (230).
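
The matching and weighting steps can be sketched as a brute-force search over each character's alternate stroke sequences. This is a simplified illustration, not the flow of FIG. 3 itself: the data tables, stroke codes, priorities, frequencies, and function names are assumptions, and a practical system would index the records rather than enumerate every combination.

```python
from itertools import product

# Stroke codes (assumed): H horizontal, V vertical, T turning.
# Each component maps its alternate stroke sequences to a current priority.
COMPONENT_SEQUENCES = {
    "十": {"HV": 0.9, "VH": 0.1},
    "口": {"VTH": 0.9, "VHT": 0.1},
}

# Each character lists one component sequence and a linguistic-model frequency.
CHARACTERS = {
    "十": (("十",), 0.5),
    "古": (("十", "口"), 0.3),
    "叶": (("口", "十"), 0.2),
}


def best_match_priority(entered, components):
    """Return the highest combined priority among the stroke-sequence choices
    whose concatenation starts with the entered strokes (0.0 if none do)."""
    best = 0.0
    options = [COMPONENT_SEQUENCES[c].items() for c in components]
    for combo in product(*options):
        full = "".join(seq for seq, _ in combo)
        if full.startswith(entered):
            priority = 1.0
            for _, p in combo:
                priority *= p
            best = max(best, priority)
    return best


def rank_candidates(entered):
    """Rank matching characters by frequency times stroke-sequence priority."""
    scored = []
    for char, (components, frequency) in CHARACTERS.items():
        score = frequency * best_match_priority(entered, components)
        if score > 0.0:
            scored.append((char, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch, entering the canonical order "HV" ranks 十 and 古 highly, while the transposed order "VH" still finds both characters but with much lower scores, so the user may need to enter more strokes or scroll further before selecting.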

[0036] If the user enters a character by matching some sequence including one or more fairly low-priority matches (220), then that character is not identified as a very likely candidate. In the system's initial state, the user must enter more of the keystrokes of that character, but normally would not have to correct the strokes. Eventually, the user enters enough strokes and is able to select the intended character, even though the user chose alternative stroke sequences for one or more of the components in that character. Thus, the system learns that the strokes that the user entered were the strokes that this user believes are the appropriate strokes for this character. The system can then trace back and dynamically change the priorities so that with some degree of usage, the system dynamically adjusts to the user's concept of the correct stroke sequence for these various components. The system determines that the user is likely to use that same stroke sequence in any of the characters in which a particular component appears.

[0037] Note that the system should not rapidly adapt to mistakes, e.g. when the user transposes two strokes accidentally. The system requires some number of repetitions to cause an alternate order to become the preferred order.
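
One way to obtain this gradual adaptation is an exponential update that moves a fixed fraction of priority toward the sequence the user actually entered each time a character is selected. The update rule, the `rate` of 0.2, and the function names are assumptions chosen for illustration; the text does not specify the adjustment formula.

```python
def reinforce(priorities, used_sequence, rate=0.2):
    """Shift priority toward the stroke sequence used for a component.

    priorities: dict mapping stroke sequences to priorities that sum to 1.
    used_sequence: the sequence the user entered before selecting a character;
        it is added with zero priority if no record exists yet.
    rate: assumed learning rate; smaller values adapt more slowly.
    """
    priorities.setdefault(used_sequence, 0.0)
    for seq in list(priorities):
        p = priorities[seq]
        if seq == used_sequence:
            priorities[seq] = p + rate * (1.0 - p)
        else:
            priorities[seq] = p * (1.0 - rate)
    return priorities


def preferred(priorities):
    """Return the stroke sequence that currently has the highest priority."""
    return max(priorities, key=priorities.get)
```

Starting from priorities of 0.9 for "HV" and 0.1 for "VH", a single use of "VH" leaves "HV" preferred, so an accidental transposition does not change the ranking; with this rate, three uses of "VH" in a row make it the preferred order. The update also keeps the priorities summing to one.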

[0038] Thus, the invention provides an adaptive system, i.e. one that adapts to the user's own concept of the stroke sequence without having to be reconfigured or manually rearranged in any way. In this way, the system allows the user to enter strokes according to his own preference. Accordingly, the user is ultimately successful in finding the character, rather than having to backtrack and guess at the stroke sequence. The user may have to enter more strokes initially, but as the system adapts, the number of strokes that must be entered may be reduced to approximately two per character.

[0039] A further aspect of the invention improves the efficiency of the system and reduces its storage requirements. Instead of storing a large amount of image data, e.g. 16 bits by 16 bits for each and every character in the character set, the system stores a small image of each of the character's components. A character can then be described, for example, as Component X at Position 1 and Component Y at Position 2 and Component Z at Position 3, as shown in FIG. 4. Accordingly, this feature of the invention defines a set of components and their position within a grid, e.g. a 16 by 16 grid. As the position of a component within a character may change its appearance, and there may be regional variations in how a component or character is drawn, the system may also store component variations on a per-location basis.

[0040] The characters are constructed programmatically on the screen. The image data graphically representing each component is drawn at the proper position for the character as defined in the ideographic description database (30).
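
A character can be assembled from component images along these lines. The sketch below uses tiny text bitmaps on a 16 by 16 grid; the table names, the placeholder components "X" and "Y", the `(component, row, column)` position format, and the `render` function are assumptions, not the format of the ideographic description database.

```python
GRID = 16  # grid size, following the 16 by 16 example in the text

# Small component images: '#' is an inked pixel, '.' is blank.
COMPONENT_IMAGES = {
    "X": ["##",
          "##"],
    "Y": ["#.",
          ".#"],
}

# Hypothetical description entries: (component, top row, left column).
DESCRIPTIONS = {
    "demo": [("X", 0, 0), ("Y", 0, 8)],
}


def render(char):
    """Draw each component of a character at its position within the grid."""
    grid = [["."] * GRID for _ in range(GRID)]
    for component, top, left in DESCRIPTIONS[char]:
        for r, line in enumerate(COMPONENT_IMAGES[component]):
            for c, pixel in enumerate(line):
                if pixel == "#":
                    grid[top + r][left + c] = "#"
    return ["".join(row) for row in grid]


for row in render("demo")[:2]:
    print(row)
```

Each component image is stored once and reused by every character that contains it, which is the source of the storage saving described above; per-location variants could be modeled as additional entries in `COMPONENT_IMAGES`.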

[0041] In an alternative embodiment, a font file contains integrated component and stroke data in an efficient format, so that each character entry describes both how it is displayed and how it is entered.

[0042] The system herein disclosed is designed to be easily customized for any number of ideographic languages, e.g. Japanese, Korean, traditional Chinese, or simplified Chinese. The ideographic description database may be provided as a software module that is readily exchanged with another module, should a different ideographic language be desired. Additionally, several such modules may be provided and the invention may include a selection menu for choosing between any of the several database modules. In this way, one may have several ideographic languages available for use at any given time. This gives the invention a great deal of flexibility in its implementation across a variety of ideographic languages. It is also easy to generate new characters by updating the ideographic description database.

[0043] In FIG. 5, a sample reduced keyboard 54 is shown and consists of keys 55 by which the user may enter strokes during the construction of a character.

[0044] The display 53 is dynamically updated to show likely characters and components upon the entry of strokes and the selection of components and characters. If the display is not large enough to present all of such matches simultaneously, and so that the user can find a character with a low-probability stroke order, a scrollbar or Page Up/Down keys may be used to scroll additional matched characters onto the display.

[0045] If the user cannot find a desired character or wants to create a new association between strokes or components and a character, other input methods, e.g. phonetic Pinyin, can be used to select a desired character. Alternately, the user may select the common structure of the character, e.g. two components side-by-side, and even select one of the component positions and specify the component for that position. By this process, the user can identify the character by specifying one or more attributes of the character.

[0046] The user may also select from one or more predefined grid arrangements to identify the kind of character. The user may also select the position of each component and the component for such position.

[0047] The output code produced as a result of user character selection can be used to input the character into an email message or other text entry field.

[0048] Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention.

[0049] Accordingly, the invention should only be limited by the claims included below.

1. A method of entering characters in an ideographic language, comprising the steps of: maintaining a record for each one of one or more characters, which comprises one or more sequences of one or more components; further maintaining a record for each of one or more components which comprises one or more sequences of entered strokes or stroke categories; entering strokes or stroke categories; comparing strokes or stroke categories and component sequences and matching one or more characters; and optionally displaying one or more matched characters; wherein each time a character is selected, input sequences for components that comprise said character are reprioritized.
2. The method of claim 1, further comprising the step of: prioritizing characters that match a stroke entry sequence according to a linguistic model.
3. The method of claim 2, where the linguistic model may include one or more of: frequency of occurrence of a character in formal or conversational written text; frequency of occurrence of a character when following a preceding character or characters; proper or common grammar of the surrounding sentence; application context of current character entry; and recency of use or repeated use of the character by the user or within an application program.
4. The method of claim 1, further comprising the step of: prioritizing a character according to the probability that a component with a matching stroke sequence occurs in said character at that point in an entered stroke sequence.
5. The method of claim 1, further comprising the step of: prioritizing each alternate stroke sequence of a component according to the probability that said stroke sequence will be entered by the user to specify said component.
6. The method of claim 1, further comprising the step of: for each component identified by matching all or a part of the stroke entry sequence with one of the stroke sequences associated with said component, using a priority associated with said stroke sequence to adjust the priority of the character or characters containing said component.
7. The method of claim 6, further comprising the step of: once a character is selected, changing the associated priority of each matching stroke sequence of each component of said character, and optionally changing an associated priority of each non-matching stroke sequence of each component of said character.
8. The method of claim 1, further comprising the step of: recording that a stroke entry sequence that a user enters for a particular component via a first selection mechanism is an appropriate stroke entry sequence for said component.
9. The method of claim 8, further comprising the step of: either of creating a new association between a stroke entry sequence and a component, and creating a new association between a component sequence and a character.
10. The method of claim 9, wherein a desired component or character is specified through a second selection mechanism.
11. The method of claim 1, further comprising the step of: displaying possible components, in addition to displaying one or more matched characters, wherein said components are optionally indicated with an underbar.
12. The method of claim 11, further comprising the step of: showing only those characters that contain a component selected by a user.
13. The method of claim 1, wherein gestures are used for stroke entry.
14. The method of claim 13, further comprising the step of: recognizing and mapping said gestures to predefined stroke categories.
15. The method of claim 13, further comprising the step of: assigning a recognition score to each said gesture that is considered in a component-matching algorithm.
16. The method of claim 1, wherein an ideographic character description database is provided for describing the appearance of characters and component objects.
17. The method of claim 16, further comprising the step of: defining a character set of components and component positions within a grid.
18. The method of claim 17, further comprising the step of: defining a character as a set of components and component positions within a grid.
19. The method of claim 18, further comprising the step of: creating a new association between a character, a set of components, and a set of component positions.
20. The method of claim 18, further comprising the step of: allowing the user to select from one or more predefined grid arrangements to identify the kind of character.
21. The method of claim 18, further comprising the step of: allowing the user to select the position of each component and the component for said position.
22. The method of claim 1, wherein input sequences identifying a multi-part component are represented by at least one of a set of linked records or by separate component records.
23. The method of claim 1, wherein an object is added to a memory if an object does not exist for an input sequence.
24. The method of claim 1, wherein one of the plurality of inputs is associated with a special wildcard input that is associated with any or all strokes or stroke categories.
25. The method of claim 18, further comprising the step of: displaying each displayable character as a set of objects positioned within a character grid.
26. The method of claim 1, wherein components are displayed as character interpretations of identified objects; and wherein after a component is selected only characters containing said selected component are further displayed or reprioritized.
27. A system for selecting strokes to select characters in an ideographic language, comprising: a user input device having a plurality of inputs, each of said plurality of inputs being associated with a plurality of user strokes that make up a character or its component parts, an input sequence being generated each time an input is selected by manipulating the user input device, wherein a generated sequence corresponds to a sequence of inputs that have been selected; a memory containing a plurality of objects, comprising character objects that are ideographic characters and component objects that comprise components corresponding to the components of a character, each of the component objects being associated with one or more input sequences, wherein each of the one or more input sequences associated with a component object is optionally associated with a dynamic priority; an output device to provide system output to the user; and a processor coupled to the user input device, memory, and output device, the processor identifying from the plurality of objects contained in the memory any object associated with each generated input sequence, and optionally generating output signals causing the output device to provide the user any object or objects as character interpretations of the entered input sequence.
28. The system of claim 27, wherein order of said priorities is initially based on an appropriate linguistic model.
29. The system of claim 27, wherein an object having a highest priority is automatically selected.
30. An ideographic language text input system comprising: a user input device comprising: a plurality of inputs, each of the plurality of inputs associated with a stroke or stroke category, an input sequence being generated each time an input is selected by manipulating the user input device, wherein a generated input sequence corresponds to a sequence of inputs that have been selected; and at least one selection input for generating an object output, wherein a stroke input sequence is terminated when the user manipulates the user input device to a selection input; a memory containing a plurality of objects, wherein each of the plurality of objects is associated with an input sequence; a display to depict system output to the user; and a processor coupled to the user input device, memory, and display, said processor comprising: an identifying means for identifying from the plurality of objects in the memory any object associated with each generated input sequence; an output means for displaying on the display the character interpretation of any identified objects associated with each generated input sequence; and a selection means for selecting the desired character for entry into a text entry display location upon detecting the manipulation of the user input device to a selection input; wherein each time a character is selected, input sequences for components that comprise said character are reprioritized.
31. The system of claim 30, wherein said selection means selects a desired character based upon identification of objects having a highest priority based on a linguistic model.
32. The system of claim 30, wherein input sequences identifying a multi-part component are represented by at least one of a set of linked records or by separate component records.
33. The system of claim 30, wherein an object is added to a memory if an object does not exist for an input sequence.
34. The system of claim 30, wherein one of the inputs is associated with a special wildcard input that is associated with any or all strokes or categories.
35. The system of claim 30, wherein an ideographic character description database is provided for describing the appearance of characters and component objects.
36. The system of claim 35, wherein each displayable character is represented as a set of objects positioned within a character grid.
37. The system of claim 30, wherein components are displayed as character interpretations of identified objects; and wherein after a component is selected only characters containing said selected component are further displayed or reprioritized.